Method and sequences for determinate nucleic acid hybridization

ABSTRACT

Provided are methods for using nucleic acid sequences having two or more degenerately pairing nucleotides, each degenerate nucleotide having a partially overlapping set of complementarity, to reduce the number of hybridizing nucleotide sequences or probes used in biochemical and molecular biological operations that involve sequence specific hybridization. The method may be employed for various hybridization procedures with sequence specific hybridization, including sequencing methods that measure hybridization directly, tagging by hybridization methods in which the sequence is determined by analyzing the pattern of tags that hybridize thereto, and hybridization dependent amplification methods. The method involves hybridizing to the nucleic acid sequence of interest a first hybridizing nucleotide sequence and a second hybridizing nucleotide sequence, each comprising a sequence complementary, or complementary except at a position of interest or variable position, to a nucleic acid sequence of interest, and analyzing whether some, all, or none of the probes or tags hybridize.

FIELD OF THE INVENTION

[0001] The present invention is directed to a method and nucleic acid sequences for determinate hybridization of nucleic acid analytes using hybridization probe sets.

BACKGROUND OF THE INVENTION

[0002] The ability to detect specific target nucleic acid analytes using nucleic acid probe hybridization and nucleic acid amplification methods has many applications. These applications include: nucleic acid sequencing, diagnoses of infectious or genetic diseases or cancers in humans or other animals; identification of viral or microbial contamination in cosmetics, foods, pharmaceuticals or water; and identification or characterization of, or genetic discrimination between individuals, for diagnosis of disease and genetic predisposition to disease, forensic or paternity testing and genetic analyses, for example breeding or engineering stock improvements in plants and animals.

[0003] The basis of nucleic acid probe hybridization methods and applications is the specific hybridization of an oligonucleotide or a nucleic acid fragment probe to form a stable, double-stranded hybrid through complementary base-pairing to particular nucleic acid sequence segments in an analyte molecule. Particular nucleic acid sequences may occur only in cells from a particular species, strain, individual or organism. Sequence specific hybridization of oligonucleotides and their analogs is a fundamental biotechnological process employed in various research, medical, and industrial applications. Specific hybridization by base pairing complementarity is utilized, for example, in identification of disease-related polynucleotides in diagnostic assays, screening of clones for polynucleotides containing a sequence of interest, identification of specific polynucleotides in mixtures of polynucleotides, amplification of specific target polynucleotides by, for example, polymerase chain reaction (PCR) and replicase enzyme mediated techniques, hybridization based histologic tissue staining, as in in situ PCR staining for histopathology, therapeutic blocking of expressed mRNA by anti-sense sequences, and DNA sequencing. For descriptions of these and other methods see, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory, New York; Keller and Manak, DNA Probes (1993) 2nd Edition, Stockton Press, New York; Milligan et al. (1993) J. Med. Chem. 36:1923-1937; Drmanac et al. (1993) Science 260:1649-52; Bains (1993) J. DNA Sequencing and Mapping 4: 143-50; U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis et al; and U.S. Pat. Nos. 4,483,964 and 4,517,338 to Urdea et al.

[0004] Base pairing specific hybridization has been proposed as a method of tracking, retrieving, and identifying compounds labeled with oligonucleotide tags. For example, in multiplex DNA sequencing, oligonucleotide tags are used to identify electrophoretically separated bands on a gel that consist of DNA fragments generated in the same sequencing reaction. DNA fragments from multiple sequencing reactions are thus separated on the same lane of a gel that is then blotted with separate solid phase materials on which the fragment bands from individual sequencing reactions are separately visualized by use of oligonucleotide probes that hybridize to complementary tags specific to the individual reaction (Church et al. (1988) Science 240: 185-88). Other uses of oligonucleotide tags or labels identifiable by hybridization based amplification have been proposed for identifying explosives, potential pollutants, such as crude oil, and currency for prevention and detection of counterfeiting. Dollinger reviews these methods, pages 265-274, in Mullis et al., Ed. (1994) The Polymerase Chain Reaction Birkhauser, Boston. More recently, systems employing oligonucleotide tags have also been proposed as a means of labeling, manipulating and identifying individual molecules in complex combinatorial chemical libraries, for example, as an aid to screening such libraries for drug candidates, Brenner and Lerner (1992) Proc. Natl. Acad. Sci. 89:5381-83; Alper (1994) Science 264:1399-1401; and Needels et al. (1993) Proc. Natl. Acad. Sci. 90: 10700-704.

[0005] Recombinant DNA technology has permitted amplification and isolation of short fragments of genomic DNA (from 200 to 500 bp) to obtain a sufficient quantity of material for determination of the nucleotide sequence from a cloned fragment. The sequence is then determined.

[0006] Distinguishing among the four nucleotides was historically achieved in two ways: (1) by specific chemical degradation of the DNA fragment at specific nucleotides, in accordance with the Maxam and Gilbert method (Maxam, A. M. and Gilbert, W. (1977) Proc. Natl. Acad. Sci. 74:560); or (2) utilizing the dideoxy sequencing method described by Sanger (Sanger, F., et al. (1977) Proc. Natl. Acad. Sci. 74:5463). The dideoxy sequencing method of Sanger results in termination of polymerization at polymer sequence positions that incorporate the specific dideoxy base instead of the corresponding deoxy base, a probabilistic event, which generates sequence segments of different length. The length of these dideoxy terminated sequence segments is determined by separation on polyacrylamide gels that separate DNA fragments in the range of 1 to 500 bp, differing in length by one nucleotide or more. The length of the terminated nucleotide sequence segments for a reaction employing the dideoxy analog of a given base indicates the positions in the sequence of interest occupied by that base.
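The read-out logic of the dideoxy method described above can be sketched in a few lines of Python. This is only an illustration, not the laboratory procedure: it assumes an idealized reaction in which every possible terminated chain is present and the gel resolves single-nucleotide differences, and it ignores the primer. The function names `terminated_lengths` and `read_gel` and the example strand are hypothetical.

```python
def terminated_lengths(strand, base):
    """Lengths of chains that end in the dideoxy analog of `base`.

    Idealized: one terminated chain for every occurrence of `base`
    in the synthesized strand (primer length ignored).
    """
    return [i + 1 for i, b in enumerate(strand) if b == base]


def read_gel(lanes):
    """Rebuild the strand from per-base lists of fragment lengths."""
    by_length = {}
    for base, lengths in lanes.items():
        for n in lengths:
            by_length[n] = base  # a chain of length n ends in `base`
    return "".join(by_length[n] for n in sorted(by_length))


strand = "GATTACA"  # hypothetical synthesized strand
lanes = {b: terminated_lengths(strand, b) for b in "ACGT"}
print(lanes["A"])       # lengths of the chains ending in ddA
print(read_gel(lanes))  # sequence recovered from the four lanes
```

Each lane lists the positions occupied by one base, so sorting all fragment lengths and reading off the lane in which each length appears recovers the sequence.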

[0007] Both preceding methods are laborious, with competent laboratories able to sequence approximately 100 bp per person per day. With the use of computers and robotics, sequencing can be accelerated by several orders of magnitude.

[0008] Sequencing the entire human genome has been widely discussed. It is generally appreciated that such an effort is possible only in large organized centers, at a cost on the order of billions of dollars, and would require at least ten years. For accuracy, three genome lengths must be sequenced, because cloned fragments of about 500 bp form at random. At about a million base pairs per day, a single center could sequence 10 billion bp in approximately 30 years. Ten such centers would be required to sequence the entire human genome in several years.
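The throughput estimate above can be checked with simple arithmetic. The sketch below uses only the figures stated in the paragraph (10 billion bp, one million bp per day, ten centers); the assumption that a center sequences on all 365 days of a year is introduced here for illustration.

```python
TOTAL_BP = 10e9          # bp to be sequenced, as stated above
BP_PER_DAY = 1_000_000   # throughput of a single center, as stated above
CENTERS = 10             # number of centers, as stated above
DAYS_PER_YEAR = 365      # assumption: sequencing runs every day

years_one = TOTAL_BP / BP_PER_DAY / DAYS_PER_YEAR
years_ten = years_one / CENTERS

print(f"one center:  {years_one:.1f} years")
print(f"ten centers: {years_ten:.1f} years")
```

Under these assumptions a single center needs a little under 30 years and ten centers a little under 3 years, consistent with the rounded figures in the text.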

[0009] A desire for understanding the genetic basis of disease and a host of other physiological states associated with different gene expression patterns has motivated the development of several approaches to large-scale DNA analysis (Adams et al., Ed. (1994) Automated DNA Sequencing and Analysis, Academic Press, New York). Contemporary analysis techniques for patterns of gene expression include large-scale sequencing, differential display, indexing schemes, subtraction hybridization, hybridization with solid phase arrays of cDNAs or oligonucleotides, and numerous DNA fingerprinting techniques. See, e.g., Liang et al. (1992) Science 257:967-71; Erlander et al. PCT Pat. App. No. PCT/US94/13041; McClelland et al., U.S. Pat. No. 5,437,975; Unrau et al. Gene (1994) 145:163-69; Schena et al. (1995) Science 270: 467-469; Velculescu et al. (1995) Science 270:484-86.

[0010] These methods may be grouped into sequencing by direct analysis of hybridization data per se, and methods that label or tag a sequence segment by hybridization. One important subclass of the tag or label group of techniques employs double stranded oligonucleotide adaptors to classify populations of polynucleotides and/or to identify nucleotides at the termini of polynucleotides, e.g. Unrau et al (1994) supra and U.S. Pat. No. 5,508,169; Sibson, PCT Pat. App. Nos. PCT/GB93/01452 and PCT/GB95/00109; Cantor, U.S. Pat. No. 5,503,980; and Brenner, PCT Pat. App. No. PCT/US95/03678 and U.S. Pat. No. 5,552,278. Adapters employed in the preceding techniques typically have protruding single strands that permit specific hybridization and ligation to polynucleotides having complementary single stranded ends (“sticky overhangs”). Identification or classification may be effected by carrying out the reactions in separate vessels, or by providing secondary labels which identify one or more nucleotides in the protruding strand of the ligated adaptor, for example by hybridization.

[0011] Successful implementation of such tagging schemes depends in large part on the success in achieving specific hybridization between analyte sequence and the adaptor-tag, and between a tag or primary probe and its complementary or secondary probe.

[0012] In techniques employing base pairing specific nucleic acid hybridization in general, including sequencing by hybridizing tags or labels, for an oligonucleotide tag to successfully identify a substance, the number of false positive and false negative signals must be minimized. Unfortunately, such spurious signals are not uncommon because base pairing and stacking free energies vary widely among nucleotides in a duplex or triplex hybridized structure. A duplex consisting of a repeated sequence of deoxyadenosine (A) and deoxythymidine (T) (or the RNA analogs, adenosine and uridine) bound to its complementary nucleic acid sequence is typically less stable than an equal-length duplex consisting of a repeated sequence of deoxyguanosine (G) and deoxycytidine (C) bound to a complementary or even partially complementary target containing a mismatch. This difference is widely appreciated and explains the higher melting temperature (T_m) of GC rich double stranded (DS) sequences compared to AT rich DS sequences. Thus, if a desired compound from a large combinatorial chemical library were tagged with the former oligonucleotide, a significant possibility would exist that under hybridization conditions designed to detect perfectly matched AT-rich duplexes, undesired compounds labeled with the GC-rich oligonucleotide (even in a mismatched duplex) would be detected along with the perfectly matched duplexes consisting of the AT-rich tag.
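The stability gap between AT-rich and GC-rich duplexes can be illustrated with the Wallace rule of thumb, which estimates the melting temperature of a short oligonucleotide as 2 degrees C per A or T plus 4 degrees C per G or C. The rule is not part of the source text and is only a coarse estimate; it ignores salt concentration, stacking and mismatches. The function name `wallace_tm` and the two example tags are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) by the Wallace rule.

    2 deg per A/T plus 4 deg per G/C. Only a coarse estimate for
    short oligonucleotides; ignores salt, stacking and mismatches.
    """
    oligo = oligo.upper()
    at = sum(oligo.count(b) for b in "AT")
    gc = sum(oligo.count(b) for b in "GC")
    return 2 * at + 4 * gc


at_rich = "ATATATATATAT"  # hypothetical AT-rich tag
gc_rich = "GCGCGCGCGCGC"  # hypothetical GC-rich tag of equal length
print(wallace_tm(at_rich), wallace_tm(gc_rich))
```

Even this crude estimate gives the GC-rich tag roughly twice the melting temperature of the AT-rich tag of the same length, which is why conditions tuned for one can admit mismatched duplexes of the other.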

[0013] In the molecular tagging system proposed by Brenner et al. supra, the related problem of mis-hybridizations of closely related (i.e., sequentially homologous) tags was addressed by employing a so-called “comma-less” code, which ensures that a probe out of register (or frame shifted) with respect to its complementary tag would result in a duplex with one or more mismatches for each of its five or more three-base words, or “codons.” Although reagents, such as tetramethylammonium chloride, are available to negate base-specific stability differences of oligonucleotide duplexes, their effect is often limited and their presence may be incompatible with, or may practically complicate, further manipulations of the hybridized complexes, e.g. amplification by polymerase chain reaction (PCR), or the like.
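The out-of-register property of a comma-less code can be sketched as a simple check: no window that straddles two adjacent words may itself be a word of the code. The snippet below tests this classical comma-free condition for a set of equal-length words. It is an illustrative sketch only; the function name `is_comma_free` and the word sets are hypothetical and are not the tags used by Brenner et al.

```python
from itertools import product


def is_comma_free(words):
    """True if no out-of-register window over two adjacent words
    is itself a word of the code (all words must be equal length)."""
    words = set(words)
    n = len(next(iter(words)))
    for x, y in product(words, repeat=2):
        joined = x + y
        # Shifts 1..n-1 are the frame-shifted readings of the pair.
        for shift in range(1, n):
            if joined[shift:shift + n] in words:
                return False
    return True


print(is_comma_free({"ACC", "AGG"}))  # no shifted window is a word
print(is_comma_free({"ACA", "CAA"}))  # "ACA"+"ACA" contains "CAA" at shift 1
```

A probe built from words that pass this check cannot pair word-for-word with its tag when shifted by one or two bases, so every frame-shifted alignment carries mismatches.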

[0014] Analogous problems have unduly complicated the simultaneous use of multiple hybridization probes, for example in analysis of multiple or complex genetic loci, e.g. via multiplex PCR, reverse dot blotting, or the like, or simply in “two-color” hybridization. Therefore, direct sequencing of certain loci, e.g. HLA genes, is advocated as a reliable alternative to indirect methods employing specific hybridization for the identification of genotypes, see, e.g., Gyllensten et al. (1988) Proc. Natl. Acad. Sci. 85:7652-56.

[0015] There remains a need in the art for methods for systematically employing a smaller number of hybridizing nucleic acid sequences, while obtaining the same amount of information from the hybridization. There also remains a need to reduce the differences in base pairing energies, especially at sequence positions of interest between different pairs of complementary nucleotide bases.

[0016] Hybridization based sequencing, regardless of the specific type, requires a larger number of hybridizing probes than processes that employ hybridization for detection by amplification, such as PCR based methods.

[0017] There remains a need for a method for streamlining the number of probes and experiments required for processes that involve hybridization, and especially for sequencing by hybridization methods, while maintaining these processes as determinate or sequence specific.

SUMMARY OF THE INVENTION

[0018] A method is provided for using nucleic acid sequences or sets of sequences having one or more degenerately pairing nucleotide sequence positions, these positions corresponding to a probed or variable position or position of interest, either by use of a degenerately pairing nucleotide or by use of two different nucleotides at the position, wherein each degenerate nucleotide position has a partially overlapping set of complementarity to reduce the number of hybridizing nucleotide sequences or probes used in biochemical and molecular biological operations employing sequence specific hybridization. The method may be employed for various hybridization procedures in which sequence specific hybridization occurs, including sequencing methods that measure hybridization directly, as by array based methods that analyze hybridization patterns and by tagging by hybridization methods in which the sequence is determined by the tagged nucleic acid sequences that hybridize thereto. The invention may also be employed in conjunction with hybridization dependent amplification methods.

[0019] The invention provides a method of reducing the required number of unique hybridizing sequences that may be used to hybridize to a nucleic acid sequence of interest under hybridizing conditions. The method involves hybridizing to the nucleic acid sequence of interest a first hybridizing nucleotide sequence and a second hybridizing nucleotide sequence, each hybridizing nucleotide sequence comprising a sequence segment complementary, or complementary except at a position of interest or probed position which comprises the position pairing to a degenerately pairing nucleotide, to a nucleic acid sequence of interest. Additional probes or hybridizing nucleotide sequences are required if there are more than four nucleotides that may be present at the variable position or position of interest. For four possible nucleotides in a sequence, two nucleic acid hybridizing sequences are required, each having a nucleotide base pairing to a set of two nucleotides at the variable position, the two sets overlapping in one nucleotide, which is common to both sets.

[0020] The position of the first hybridizing nucleotide sequence probe corresponding to the variable or probed position comprises a nucleotide base pairing with a first set of two or more nucleotides, and the position of the second hybridizing nucleotide corresponding to the variable position comprises a nucleotide base pairing with a second set of two or more nucleotides. The first set of two or more nucleotides present in the analyte nucleic acid sequence includes at least one nucleotide that is a member of the second set of two or more nucleotides present in the nucleic acid sequence. The base pairing sets comprising the first set of two or more nucleotides and the second set of two or more nucleotides are not identical. A nucleotide present in the nucleic acid sequence of interest is not represented in the first base pairing set of two or more nucleotides, and the same nucleotide not represented in the first set of two or more nucleotides is also not present in the second base pairing set of two or more nucleotides. The conditions are such that hybridization of each of the first and second hybridizing nucleotide sequences occurs only if complementarity exists between a nucleotide at the variable position of the sequence of interest and a nucleotide at the corresponding position of these hybridizing nucleotide sequences.
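The conditions on the two base pairing sets stated above can be written as a small validity check. The sketch below is one reading of those conditions, not a definitive implementation: each set has two or more members, the sets share at least one nucleotide, they are not identical, and at least one nucleotide of the alphabet appears in neither. The function name `valid_probe_sets` is hypothetical, and the example sets {A, G} and {A, C} are chosen for illustration because the drawings show dP pairing with A and G and 8-oxo-G pairing with A and C.

```python
def valid_probe_sets(first, second, alphabet="ACGT"):
    """Check the stated conditions on two base pairing sets."""
    first, second, alphabet = set(first), set(second), set(alphabet)
    big_enough = len(first) >= 2 and len(second) >= 2
    overlap = bool(first & second)               # share at least one nucleotide
    distinct = first != second                   # sets are not identical
    unrepresented = alphabet - (first | second)  # present in neither set
    return big_enough and overlap and distinct and bool(unrepresented)


print(valid_probe_sets("AG", "AC"))  # overlap A, T unrepresented
print(valid_probe_sets("AG", "CT"))  # no overlap, nothing unrepresented
print(valid_probe_sets("AG", "AG"))  # identical sets
```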

[0021] Depending upon the identity of the nucleotide at the variable position of the sequence of interest, one, both, or neither of the first hybridizing nucleotide sequence and the second hybridizing nucleotide sequence hybridizes to the sequence of interest. The probes or hybridizing sequences having a multiply base pairing nucleotide may be hybridized to the nucleic acid sequence of interest simultaneously, sequentially or separately.
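The one, both, or neither outcome can be decoded into a base call as sketched below. The pairing sets are assumptions made for illustration: the first probe position is taken to pair with {A, G} (as the drawings show for dP) and the second with {A, C} (as the drawings show for 8-oxo-G), leaving T unrepresented. The function name `call_base` is hypothetical, and hybridization is idealized as a clean yes or no.

```python
ALPHABET = frozenset("ACGT")
FIRST_SET = frozenset("AG")   # assumed: probe position carrying dP
SECOND_SET = frozenset("AC")  # assumed: probe position carrying 8-oxo-G


def call_base(first_hybridized, second_hybridized):
    """Identify the nucleotide at the variable position from which
    of the two degenerate probes hybridized."""
    if first_hybridized and second_hybridized:
        (base,) = FIRST_SET & SECOND_SET             # the shared nucleotide
    elif first_hybridized:
        (base,) = FIRST_SET - SECOND_SET             # only in the first set
    elif second_hybridized:
        (base,) = SECOND_SET - FIRST_SET             # only in the second set
    else:
        (base,) = ALPHABET - FIRST_SET - SECOND_SET  # in neither set
    return base


for outcome in [(True, True), (True, False), (False, True), (False, False)]:
    print(outcome, call_base(*outcome))
```

Two yes or no observations give four distinct outcomes, which is exactly enough to distinguish four nucleotides.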

[0022] Provided for the determinate use of degenerately complementary nucleotides having overlapping base pairing complementarity sets, are check probes comprising nucleotides complementary to a nucleotide present in no degenerate base pairing set. These check probes establish that a failure of hybridization is indeed because of the presence at the relevant position(s) in the sequence of interest of the nucleotide not represented in any of the degenerately pairing nucleotide complementarity sets. An ultimate check probe, a null hybridizing sequence that is complementary at all probed for positions of a probed segment to the unrepresented complementarity of the overlapping degenerate complementarity sets comprises a nucleic acid sequence complementary to that segment and does not hybridize to any of the degenerately hybridizing probes. By having the nucleotide represented in none of the sets of nucleotides pairing to the null probe at all the tested positions in the segment, the presence of a nucleic acid sequence to which none of the primary, degenerately pairing, probes hybridizes is established. When only one of the tested or variable positions in a sequence segment is probed, the failure to hybridize of those probes having the multiply pairing nucleotides at one of the variable positions probed in the nucleic acid sequence segment indicates the presence, in the probed sequence position, of the nucleotide not present in the overlapping degenerately base pairing sets.
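The role of the check probe and the null probe can be sketched as follows. The example assumes degenerate pairing sets {A, G} and {A, C}, so that T is the unrepresented nucleotide; these sets, the names `interpret` and `null_probe_binds`, and the "inconsistent" and "no call" outcomes are illustrative assumptions, not terms from the text. Hybridization is idealized as a clean yes or no.

```python
UNREPRESENTED = "T"  # assumed: in neither degenerate pairing set

# Calls for the cases where at least one primary probe hybridized.
CALLS = {(True, True): "A", (True, False): "G", (False, True): "C"}


def interpret(first_hyb, second_hyb, check_hyb):
    """Call one probed position from two degenerate probes plus a
    check probe that pairs only with the unrepresented nucleotide."""
    if first_hyb or second_hyb:
        # A primary probe bound, so the check probe should not have.
        return "inconsistent" if check_hyb else CALLS[(first_hyb, second_hyb)]
    # Neither primary probe bound: trust this only if the check probe did.
    return UNREPRESENTED if check_hyb else "no call"


def null_probe_binds(segment, probed_positions):
    """Null probe: pairs only if every probed position of the segment
    holds the unrepresented nucleotide."""
    return all(segment[i] == UNREPRESENTED for i in probed_positions)


print(interpret(False, False, True))        # failure explained by the check probe
print(interpret(False, False, False))       # failure not explained
print(null_probe_binds("GTCTA", [1, 3]))    # both probed positions are T
```

The check probe turns a bare failure to hybridize, which could also reflect a failed experiment, into a positive observation of the unrepresented nucleotide.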

[0023] Any hybridization dependent attribute of a system may be determinately followed or studied by the method of the invention using a reduced number of hybridizing sequences or probes. The ability to streamline the process or experiment and derive the same quantum of information may be employed in both direct hybridization based sequencing and in sequencing by hybridizing tags that carry a label such as a label nucleotide sequence or a fluorescent or other spectroscopically or otherwise distinguishable moiety. Enzymatic amplifications requiring hybridized nucleic acid probes, including PCR primers and the like, may be studied. Methods of studying nucleic acid hybridization in living cells may also employ degenerate base pairing probes having incompletely overlapping complementarity sets.

[0024] Typically, for four possible nucleotides present in a sequence, use of doubly degenerate base pairing positions overlapping in one nucleotide in their base pairing sets permits use of half the number of probes or hybridizing sequences, while the data may be analyzed to yield the same amount of information as if non-degenerately base pairing probes had been used. If six nucleotides are present in the nucleic acid sequence of interest, each label nucleotide sequence comprises, at the position corresponding to the variable position, a nucleotide base pairing with four nucleotides, and five probe or hybridizing nucleotide sequences must be employed.

[0025] For example, the invention provides a method for determining a nucleotide at a position of interest in a nucleic acid sequence under conditions suitable for hybridization having a one base pair mismatch stringency, e.g., wherein a single base pair mismatch does not hybridize. The method comprises hybridizing to the target or analyte nucleic acid sequence a first probe comprising a nucleic acid sequence complementary, or complementary except at the probed position or position of interest, to the nucleic acid sequence. It is provided that the position of the first probe corresponding to the position of interest comprises a nucleotide base pairing with a first set of two nucleotides of four present in the nucleic acid sequence, and that a nucleotide present in the nucleic acid sequence is not represented in both the first set and the second set of two or more nucleotides. Under the conditions hybridization of the first probe (and the second probe) to the nucleic acid sequence occurs only if complementarity exists between the nucleotide at the position of interest and the nucleotide at the corresponding position of the first probe. Also employed for hybridizing to the nucleic acid sequence is a second probe comprising a nucleic acid sequence complementary or complementary except at the position of interest, to the nucleic acid sequence. It is provided that the position of the second probe corresponding to the position of interest comprises a nucleotide base pairing with a second set of two or more nucleotides present in the nucleic acid sequence. It is further provided that a nucleotide present in the nucleic acid sequence that is not represented in the second set of two or more nucleotides is not represented in the first set of two or more nucleotides. 
Also, the first set of two or more nucleotides present in the nucleic acid sequence includes one nucleotide that is a member of the second set of two or more nucleotides present in the nucleic acid sequence, the first set of two or more nucleotides and the second set of two or more nucleotides are not identical, and a nucleotide present in the nucleic acid sequence that is not represented in the first set of two or more nucleotides is not represented in the second base pairing set of two or more nucleotides.

[0026] For four nucleotides two probes per sequence position are required instead of four per sequence position using non-degenerately pairing probe positions, and cumulative information as to the identity of the nucleotide at the position of interest is obtained from the combined data from the first and second probes, both, neither or one or the other pairing with the probed position.
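Whether a given set of probes is determinate can be checked by computing, for each nucleotide, the pattern of probes that would hybridize to it and confirming that no two nucleotides share a pattern. The sketch below does this for the four-nucleotide case. The names `signatures` and `is_determinate` are hypothetical, and the degenerate sets {A, G} and {A, C} are assumed for illustration.

```python
def signatures(pairing_sets, alphabet="ACGT"):
    """Map each nucleotide to the tuple of probes (True/False) that
    would hybridize when it occupies the variable position."""
    return {nt: tuple(nt in s for s in pairing_sets) for nt in alphabet}


def is_determinate(pairing_sets, alphabet="ACGT"):
    """True if every nucleotide yields a distinct hybridization pattern."""
    sig = signatures(pairing_sets, alphabet)
    return len(set(sig.values())) == len(alphabet)


degenerate = [set("AG"), set("AC")]                      # two probes per position
conventional = [set("A"), set("C"), set("G"), set("T")]  # four probes per position
ambiguous = [set("AG"), set("CT")]                       # non-overlapping sets

print(len(degenerate), is_determinate(degenerate))
print(len(conventional), is_determinate(conventional))
print(len(ambiguous), is_determinate(ambiguous))
```

The two overlapping degenerate sets resolve all four nucleotides with half the probes of the conventional set, whereas two non-overlapping sets of two leave A indistinguishable from G and C indistinguishable from T.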

BRIEF DESCRIPTION OF THE DRAWINGS

[0027] FIG. 1 shows the degenerately pairing nucleotide dP.

[0028] FIG. 1A shows the imino form of dP pairing with Adenine (A).

[0029] FIG. 1B shows the amino form of dP pairing with Guanine (G).

[0030] FIG. 2 shows the degenerately pairing nucleotide 8-oxo-G pairing to A in a base pairing interaction resembling a wobble base pair.

[0031] FIG. 3 shows the degenerately pairing nucleotide 8-oxo-G pairing to C in conventional Watson-Crick base pairing interaction substantially the same as a G::C base pairing interaction.

DETAILED DESCRIPTION OF THE INVENTION

[0032] In describing and claiming the present invention, the following terminology will be used in accordance with the definitions set out below.

[0033] The term “adsorb” as used herein refers to the noncovalent retention of a molecule by a substrate surface. That is, adsorption occurs as a result of noncovalent interaction between a substrate surface and adsorbing moieties present on the molecule that is adsorbed. Adsorption may occur through hydrogen bonding, van der Waals forces, polar attraction or electrostatic forces (i.e., through ionic bonding). Examples of adsorbing moieties include, but are not limited to, amine groups, carboxylic acid moieties, hydroxyl groups, nitroso groups, sulfones and the like. Often the substrate may be functionalized with adsorbent moieties to interact in a certain manner, as when the surface is functionalized with amino groups to render it positively charged in a pH neutral aqueous environment. Likewise, adsorbate moieties may be added in some cases to effect adsorption, as when a basic protein is fused with an acidic peptide sequence to render adsorbate moieties that can interact electrostatically with a positively charged adsorbent moiety.

[0034] The term “attached,” as in, for example, a substrate surface having a moiety “attached” thereto, includes covalent binding, adsorption, and physical immobilization. The terms “binding” and “bound” are identical in meaning to the term “attached.”

[0035] The term “array” used herein refers to a two-dimensional arrangement of features such as an arrangement of reservoirs (e.g., wells in a well plate) or an arrangement of different materials including ionic, metallic or covalent crystalline, including molecular crystalline, composite or ceramic, glassine, amorphous, fluidic or molecular materials on a substrate surface (as in an oligonucleotide or peptidic array). Different materials in the context of molecular materials includes chemical isomers, including constitutional, geometric and stereoisomers, and in the context of polymeric molecules constitutional isomers having different monomer sequences. Arrays are generally comprised of regular, ordered features, as in, for example, a rectilinear grid, parallel stripes, spirals, and the like, but non-ordered arrays may be advantageously used as well. An array is distinguished from the more general term pattern in that patterns do not necessarily contain regular and ordered features. The arrays or patterns formed using the devices and methods of the invention have no optical significance to the unaided human eye. For example, the invention does not involve ink printing on paper or other substrates in order to form letters, numbers, bar codes, figures, or other inscriptions that have optical significance to the unaided human eye. In addition, arrays and patterns formed by the deposition of ejected droplets on a surface as provided herein are preferably substantially invisible to the unaided human eye. Arrays typically but do not necessarily comprise at least about 4 to about 10,000,000 features, generally in the range of about 4 to about 1,000,000 features.

[0036] The terms “biomolecule” and “biological molecule” are used interchangeably herein to refer to any organic molecule, whether naturally occurring, recombinantly produced, or chemically synthesized in whole or in part, that is, was or can be a part of a living organism, or synthetic analogs of molecules occurring in living organisms including nucleic acid analogs having peptide backbones and purine and pyrimidine sequence, carbamate backbones having side chain sequence resembling peptide sequences, and analogs of biological molecules such as epinephrine, GABA, endorphins, interleukins and steroids. The term encompasses, for example, nucleotides, amino acids and monosaccharides, as well as oligomeric and polymeric species such as oligonucleotides and polynucleotides, peptidic molecules such as oligopeptides, polypeptides and proteins, saccharides such as disaccharides, oligosaccharides, polysaccharides, mucopolysaccharides or peptidoglycans (peptido-polysaccharides) and the like. The term also encompasses two different biomolecules linked together, for example a hybridization probe or adapter linked to the green fluorescent protein, or another luminescent molecule including a chemiluminescent molecule. The term also encompasses synthetic GABA analogs such as benzodiazepines, synthetic epinephrine analogs such as isoproterenol and albuterol, synthetic glucocorticoids such as prednisone and betamethasone, and synthetic combinations of naturally occurring biomolecules with synthetic biomolecules, such as theophylline covalently linked to betamethasone.

[0037] The term “biomaterial” refers to any material that is biocompatible, i.e., compatible with a biological system comprised of biological molecules as defined above.

[0038] The terms “library” and “combinatorial library” are used interchangeably herein to mean a plurality of chemical or biological moieties. Such moieties may be present in separate containers, including an array of well plate wells, or present on the surface of a substrate, such as attached to discrete beads which may be arrayed, or arrayed on a substrate surface, attached or not attached, with or without physical or spatial barriers separating one discrete region having an individual moiety from another, so long as each moiety is different from each other moiety. The moieties may be, e.g., peptidic molecules and/or oligonucleotides.

[0039] The term “moiety” refers to any particular composition of matter, e.g., a molecular fragment, an intact molecule (including a monomeric molecule, an oligomeric molecule, and a polymer), or a mixture of materials (for example, an alloy or a laminate).

[0040] It will be appreciated that, as used herein, the terms “nucleoside” and “nucleotide” refer to nucleosides and nucleotides containing not only the conventional purine and pyrimidine bases, i.e., adenine (A), thymine (T), cytosine (C), guanine (G) and uracil (U), but also protected forms thereof, e.g., wherein the base is protected with a protecting group such as acetyl, difluoroacetyl, trifluoroacetyl, isobutyryl or benzoyl, and purine and pyrimidine analogs. Suitable analogs will be known to those skilled in the art and are described in the pertinent texts and literature. Common analogs include, but are not limited to, 1-methyladenine, 2-methyladenine, N⁶-methyladenine, N⁶-isopentyladenine, 2-methylthio-N⁶-isopentyladenine, N,N-dimethyladenine, 8-bromoadenine, 2-thiocytosine, 3-methylcytosine, 5-methylcytosine, 5-ethylcytosine, 4-acetylcytosine, 1-methylguanine, 2-methylguanine, 7-methylguanine, 2,2-dimethylguanine, 8-oxoguanine (8-oxo-G), 8-bromoguanine, 8-chloroguanine, 8-aminoguanine, 8-methylguanine, 8-thioguanine, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, 5-ethyluracil, 5-propyluracil, 5-methoxyuracil, 5-hydroxymethyluracil, 5-(carboxyhydroxymethyl)uracil, 5-(methylaminomethyl)uracil, 5-(carboxymethylaminomethyl)-uracil, 2-thiouracil, 5-methyl-2-thiouracil, 5-(2-bromovinyl)uracil, uracil-5-oxyacetic acid, uracil-5-oxyacetic acid methyl ester, pseudouracil, 1-methylpseudouracil, 6-(beta-D-ribofuranosyl)-3,4-dihydro-8H-pyrimido[4,5-c]-[1,2]oxazin-7-one (P), queuosine, inosine, 1-methylinosine, hypoxanthine, xanthine, 2-aminopurine, 6-hydroxyaminopurine, 6-thiopurine and 2,6-diaminopurine. In addition, the terms “nucleoside” and “nucleotide” include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. 
Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.

[0041] As used herein, the term “oligonucleotide” shall be generic to polydeoxynucleotides (containing 2-deoxy-D-ribose), to polyribonucleotides (containing D-ribose), to any other type of polynucleotide that is an N-glycoside of a purine or pyrimidine base, and to other polymers containing nonnucleotidic backbones (for example PNAs), providing that the polymers contain nucleobases in a configuration that allows for base pairing and base stacking, such as is found in DNA and RNA. Thus, these terms include known types of oligonucleotide modifications, for example, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), with negatively charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), and with positively charged linkages (e.g., aminoalkylphosphoramidates, aminoalkylphosphotriesters), those containing pendant moieties, such as, for example, proteins (including nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), and those containing chelators (e.g., metals, radioactive metals, boron, oxidative metals, etc.). There is no intended distinction in length between the terms “polynucleotide” and “oligonucleotide,” and these terms will be used interchangeably, but do not include monomers; thus a minimum length of two nucleotides is contemplated by these terms. These terms refer only to the primary structure of the molecule. As used herein the symbols for nucleotides and polynucleotides are according to the IUPAC-IUB Commission of Biochemical Nomenclature recommendations (Biochemistry 9:4022, 1970).

[0042] The term “substrate” as used herein refers to any material having a surface onto which one or more fluids may be deposited. The substrate may be constructed in any of a number of forms such as wafers, slides, well plates, membranes, for example. In addition, the substrate may be porous or nonporous as may be required for any particular fluid deposition. Suitable substrate materials include, but are not limited to, supports that are typically used for solid phase chemical synthesis, e.g., polymeric materials (e.g., polystyrene, polyvinyl acetate, polyvinyl chloride, polyvinyl pyrrolidone, polyacrylonitrile, polyacrylamide, polymethyl methacrylate, polytetrafluoroethylene, polyethylene, polypropylene, polyvinylidene fluoride, polycarbonate, divinylbenzene styrene-based polymers), agarose (e.g., Sepharose®), dextran (e.g., Sephadex®), cellulosic polymers and other polysaccharides, silica and silica-based materials, glass (particularly controlled pore glass, or “CPG”) and functionalized glasses, ceramics, and such substrates treated with surface coatings, e.g., with microporous polymers (particularly cellulosic polymers such as nitrocellulose and spun synthetic polymers such as spun polyethylene), metallic compounds (particularly microporous aluminum), or the like. While the foregoing support materials are representative of conventionally used substrates, it is to be understood that the substrate may in fact comprise any biological, nonbiological, organic and/or inorganic material, and may be in any of a variety of physical forms, e.g., particles, strands, precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates, slides, and the like, and may further have any desired shape, such as a disc, square, sphere, circle, etc. The substrate surface may or may not be flat, e.g., the surface may contain raised or depressed regions. 
A substrate may additionally contain or be derivatized to contain reactive functionality that covalently links a compound to the surface thereof. These are widely known and include, for example, silicon dioxide supports containing reactive Si—OH groups, polyacrylamide supports, polystyrene supports, polyethyleneglycol supports, and the like.

[0043] The term “surface modification” as used herein refers to the chemical and/or physical alteration of a surface by an additive or subtractive process to change one or more chemical and/or physical properties of a substrate surface or a selected site or region of a substrate surface. For example, surface modification may involve (1) changing the wetting properties of a surface, (2) functionalizing a surface, i.e., providing, modifying or substituting surface functional groups, (3) defunctionalizing a surface, i.e., removing surface functional groups, (4) otherwise altering the chemical composition of a surface, e.g., through etching, (5) increasing or decreasing surface roughness, (6) providing a coating on a surface, e.g., a coating that exhibits wetting properties that are different from the wetting properties of the surface, and/or (7) depositing particulates on a surface.

[0044] The phrase “base pairing” as used in this application is intended to encompass all manner of specific pairings between the bases that make up nucleic acid sequences. Specifically contemplated are the most typically observed Watson-Crick base pairings between antiparallel sequences in which the pairing scheme is {[A::T or U], [G::C]}, with the former pairing being stabilized by two hydrogen bond interactions and the latter being stabilized by three hydrogen bonds. Also encompassed are Hoogsteen, triplex and wobble base pairing interactions, and the like. The base pairing may be between two free nucleotides or nucleosides, or a free nucleotide or nucleoside and a position of a nucleic acid sequence, or between nucleic acid sequences at individual corresponding positions of a nucleic acid hybridized structure.

[0045] The adjectival term “hybridized” refers to two or more sequentially adjacent base pairings. The term “hybridization” refers to the process by which sequences become hybridized. The verb to hybridize and gerund form hybridizing refer to experimental or attempted hybridization by contacting nucleic acid sequences under conditions suitable for hybridization.

[0046] The term “complementarity” or “complementary” as used in this application denotes the capacity for cumulative base pairing between nucleic acid sequences at individual corresponding positions, as in a nucleic acid hybridized structure. “Complete” or “perfect” complementarity describes a stabilizing base pairing interaction at each sequence position to corresponding sequence position in a nucleotide sequence. “Partial” complementarity describes nucleic acid sequences that do not base pair at each position. A single mismatch partial complementarity refers to sequences that base pair at every position but one.
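By way of illustration only, the distinction between perfect, single mismatch and other partial complementarity may be sketched as follows. The sketch assumes Watson-Crick pairing only, equal-length sequences written 5′ to 3′, and antiparallel alignment; the names `WATSON_CRICK`, `count_mismatches` and `classify` are hypothetical and do not appear elsewhere in this application.

```python
# Illustrative assumption: only canonical Watson-Crick pairs are scored.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def count_mismatches(probe: str, target: str) -> int:
    """Count positions that fail to base pair when probe and target
    (both written 5'->3', equal length) are aligned antiparallel."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have equal length")
    # Position i of the probe faces position len-1-i of the target.
    return sum(WATSON_CRICK[p] != t for p, t in zip(probe, reversed(target)))


def classify(probe: str, target: str) -> str:
    """Name the kind of complementarity between probe and target."""
    mismatches = count_mismatches(probe, target)
    if mismatches == 0:
        return "perfect"
    if mismatches == 1:
        # A single mismatch is a special case of partial complementarity.
        return "single mismatch"
    return "partial"


print(classify("GCTTT", "AAAGC"))
print(classify("GCTTA", "AAAGC"))
```

Degenerately pairing nucleotides would be accommodated by replacing the one-to-one pairing table with a table of complementarity sets.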

[0047] The phrase “complementarity set” or “base pairing set” as used in this application refers to the set of nucleotides that base pair to an analyte, probed or target nucleic acid sequence at a variable or probed position or position of interest. The complementary sequence therefore may comprise at the position corresponding to the position of interest or variable position of the analyte sequence any of the members of the complementary set. The complementarity or base pairing set includes nucleotides that are complementary in the context of single stranded sequence hybridization and/or incoming nucleotide base pairing for nucleic acid polymerase synthesis from a template. The phrase complementarity set used in reference to a sequence of two or more nucleotides refers to the set of all the sequences that are complementary, capable of hybridizing, to that sequence.

[0048] The phrase “overlapping complementarity sets” refers to complementarity sets that have one or more nucleotides in common, or one or more sequences in common. Unique complementarity sets will not completely overlap, with such sets related by a set/subset or partial overlap relationship. Examples of overlapping complementarity sets having partial overlap are the complementarity sets of the degenerately pairing nucleotide analogs dPTP (complementarity set: {A, G}) and 8-oxo-dGTP (complementarity set: {A, C}). Thus the complementarity sets of P and 8-oxo-G are unique and of the partial overlap type, having a common base A and an excluded base T. Each set also has a non-common or unique base: G for P and C for 8-oxo-G are the respective unique bases in the partially overlapping complementarity sets.
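The manner in which two partially overlapping complementarity sets resolve four possible nucleotides may be sketched as follows. This is a minimal illustration, assuming idealized all-or-none hybridization at the probed position and the complementarity sets stated above for P and 8-oxo-G; the function name `call_base` is hypothetical.

```python
# Complementarity sets as stated in the text for the two analogs.
P_SET = frozenset("AG")       # nucleotide P (from dPTP)
OXO_G_SET = frozenset("AC")   # 8-oxo-G (from 8-oxo-dGTP)


def call_base(p_probe_hybridizes: bool, oxo_g_probe_hybridizes: bool) -> str:
    """Infer the nucleotide at the probed position from which probes hybridize.

    A probe that hybridizes restricts the candidates to its complementarity
    set; a probe that fails to hybridize excludes its complementarity set.
    """
    candidates = set("ACGT")
    observations = ((P_SET, p_probe_hybridizes), (OXO_G_SET, oxo_g_probe_hybridizes))
    for pairing_set, hybridized in observations:
        if hybridized:
            candidates &= pairing_set
        else:
            candidates -= pairing_set
    (base,) = candidates  # exactly one candidate remains in each of the four cases
    return base


for p_hyb in (True, False):
    for oxo_hyb in (True, False):
        print(p_hyb, oxo_hyb, call_base(p_hyb, oxo_hyb))
```

Both probes hybridizing indicates the common base A, only one probe hybridizing indicates that probe's unique base (G or C), and neither probe hybridizing indicates the excluded base T.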

[0049] The phrase “hybridizing conditions” or “conditions suitable for hybridization” or like phrases used herein contemplates those conditions necessary for hybridization, e.g. those conditions appropriate to permit hybridization of nucleic acids taught by the invention. The specific chemical and physical conditions appropriate, suitable or effective for hybridization as practiced in the invention are known or ascertainable by those of skill in the art of nucleic acid detection and assay. Conditions suitable for hybridization include a range of conditions adequate for forming any hybridized nucleic acid species required for hybridization by the methods of the invention, and include a range of hybridization conditions having various stringencies. Thus the range of hybridization conditions includes conditions effecting high, medium and low stringency nucleic acid hybridization including a stringency sufficient to preclude formation of significant amounts of double stranded complementary structures in a given length of sequence for one, internal or external mismatch. Less stringent hybridization conditions are capable of discerning only a greater sequence mismatch using hybridization.

[0050] The term “hybridization probe” as used herein refers to a nucleic acid sequence that by itself or as a member of a set of nucleic acid sequences or probes for a specific nucleic acid sequence, effects the hybridization of a specific target sequence. The hybridization probes of the invention comprise a nucleic acid sequence segment having sequence complementary to the analyte sequence of interest. Such probes may comprise nucleic acid sequence for potential hybridization with analyte only, or may additionally comprise a discrete tagging or labeling moiety, such as a chemiluminescent moiety or a discrete nucleic acid sequence that is not a putative anti-target or anti-analyte sequence, but functions solely to indicate the presence of the probe. Such hybridization probes include sequences that form hybrids for enzymatic amplification such as primers for polymerase chain reaction amplification and sequences forming double stranded complex replication templates for enzymes such as the RNA replicases. In addition to hybridizing probes for an amplification process, probes for simple hybridization and detection, both tagged or labeled with a discrete moiety and not labeled with any discrete label moiety, are contemplated. Nucleic acid sequences comprising probes not having a discrete labeling moiety may be intrinsically labeled for detection of the hybridization, as by incorporation of ³²P into the nucleic acid phosphodiester backbone or the like. Hybridization probes may comprise a sequence complementary to the sequence to be detected and a detectable signal or marker indicating the presence of the complementary sequence, for example a separate moiety such as a chemiluminescent marker, or ³²P incorporated into the phosphodiester backbone of the nucleic acid sequence, or both.

[0051] The phrase “analyte sequence” or “probed sequence” or “target sequence” refers to a nucleic acid sequence that is to be detected.

[0052] Hybridization based procedures are important in amplification and detection of nucleic acid sequences generally, and in amplification and/or detection for sequencing. The amplification of nucleic acids typically employs hybridizing probes such as the primers used in polymerase chain reaction (PCR) (see generally U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis et al.) and the hybridizing probes that are used to achieve amplification of such probes when they form a complex template substrate for an RNA replicase enzyme (see generally U.S. Pat. No. 4,786,600 to Kramer; U.S. Pat. No. 5,407,798 to Martinelli et al. and U.S. Pat. No. 6,090,589 to Dimond et al.). The methods that employ RNA replicases obtain amplification by the amplification of an amplification probe sequence rather than by direct amplification of a segment or segments of target nucleic acid analyte. Methods based upon PCR specifically amplify a target sequence that requires a specific hybridized primer. Consequently, the RNA replicase amplification probe or probes employed must be carefully designed to both form the correct complex template required by the replicase enzyme, and to effect amplification of the correct probe. Analogously PCR probes must also be carefully designed to effectuate the amplification of the probed for sequence.

[0053] The use of hybridization for sequencing may involve direct use of hybridization data, wherein the sequence of an unknown or analyte sequence is obtained by hybridization to known nucleic acid sequences with overlapping sequence under conditions that permit no mismatches in base pairing (U.S. Pat. Nos. 5,492,806, 5,525,464 and 5,695,940 to Drmanac et al.).

[0054] Sequencing by hybridization (SBH) of a target nucleic acid may be described as a two step process: (i) disassembling the target nucleic acid into all its constituent oligonucleotides of length N (N-mers); and (ii) deducing the sequence by assembly of the N-mers detected by hybridization, in a sequential N-mer arrangement indicated by sequence overlap, into an extended sequence. In classical SBH of this type, hybridization of all possible N-mer oligonucleotide hybridization probes to the target nucleic acid determines the N-mer oligonucleotide subset contained in the primary sequence of the target nucleic acid and is the first step in the process. The methods and partially overlapping degenerate base pairing positions of the nucleic acid sequences of the invention permit, for nucleic acids having four possible nucleotides, employment of half as many probes as the number of N-mers.
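Step (i) may be illustrated, purely conceptually, by enumerating the constituent N-mers of a known sequence. The sketch below uses the 13-nucleotide analyte of SEQ ID NO. 1 with N = 5 as an assumed example; the function name `constituent_nmers` is hypothetical, and in practice the N-mer content is determined by hybridization rather than computed.

```python
def constituent_nmers(target: str, n: int) -> list[str]:
    """Return every overlapping N-mer of the target, in order of occurrence."""
    if n < 1 or n > len(target):
        raise ValueError("n must be between 1 and the target length")
    # A target of length L contains L - n + 1 overlapping N-mers.
    return [target[i:i + n] for i in range(len(target) - n + 1)]


analyte = "ATAAAGCTGCTTC"  # SEQ ID NO. 1
for nmer in constituent_nmers(analyte, 5):
    print(nmer)
```

Consecutive N-mers in this enumeration share an (N-1)-mer, which is the overlap relied upon in step (ii).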

[0055] For example, for 8-mers, 4⁸ possible sequences exist, but the invention permits using only 2*4⁷ sequences for obtaining the sequence. For a single variable position per hybridization probe, 4⁷ possible sequences exist not including the variable position, and the variable position has two possible partially overlapping degenerate base pairing or complementarity sets, thus permitting 2*4⁷ possible probes. If two variable positions are employed, 4⁶ possible sequences exist for the positions that are not variable, and each variable position has two possible partially overlapping degenerate base pairing or complementarity sets, thus permitting 2*2*4⁶ (4⁷) possible probes. However, use of two variable positions complicates both data acquisition and analysis, as base pairing or the lack thereof must be independently detected and analyzed at each variable position. For example, in addition to high stringency hybridization where a single mismatch precludes hybridization, experiments permitting single mismatch hybridization but prohibiting double mismatch hybridization must be employed. Data acquisition would additionally require the capacity to determine at which variable position the single mismatch occurs, and the data analysis would be twice as computationally complex. This capacity could, for example, be obtained by employing a probe having a terminal and an internal variable position in conjunction with calorimetric methods, such as differential scanning calorimetry (DSC).
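The probe counts recited above may be checked with a short calculation. The sketch assumes four possible nucleotides at each non-variable position and two complementarity sets at each variable position, as in the example; the function name `probe_count` and its parameters are hypothetical.

```python
def probe_count(n: int, variable_positions: int,
                alphabet: int = 4, sets_per_position: int = 2) -> int:
    """Number of distinct probes of length n having the given number of
    degenerately pairing (variable) positions."""
    if not 0 <= variable_positions <= n:
        raise ValueError("variable_positions must be between 0 and n")
    fixed_positions = n - variable_positions
    return sets_per_position ** variable_positions * alphabet ** fixed_positions


print(probe_count(8, 0))  # 4**8, all conventional 8-mers
print(probe_count(8, 1))  # 2 * 4**7, one variable position
print(probe_count(8, 2))  # 2 * 2 * 4**6 == 4**7, two variable positions
```

Each variable position halves the number of probes relative to the conventional N-mer set under these assumptions.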

[0056] The preceding SBH methods may be practiced by employing an array of hybridization probes attached to a substrate surface, or an array of separate beads. Or, the beads or free (unattached) hybridization probes may be present in a plurality of different assay containers either arrayed in well plate wells, or the containers may be discrete. Integrated infrared video imaging with integration for discrete array sites may conveniently be employed to detect hybridization and to differentiate different stabilization energies for example in two variable position hybridization probes.

[0057] A nucleic acid fragment can be deconstructed into all constituent oligonucleotides. Positively hybridizing N-mer oligonucleotide probes are sequentially ordered and the sequence of the analyte DNA is determined using (N-1)-mer overlapping frames between the oligonucleotide probes.

[0058] The sequence is deduced by reassembly of the sequence of known (N-1)-mer overlapping oligonucleotides that hybridize to the target nucleic acid to generate the sequence of the target nucleic acid. This cannot be accomplished in some cases, because some information is lost if the target nucleic acid is not in fragments of appropriate size in relation to the size of the oligonucleotides used as hybridization probes. The quantity of information lost is proportional to the length of the target being sequenced. However, if sufficiently short targets are employed, their sequence can be unambiguously determined. The deductive construction of the sequence is interrupted in analyte sequence regions where a given overlapping (N-1)-mer is duplicated to appear at least three times in succession, e.g. repeated two or more times, causing the deduced sequence to skip the second and subsequent repetitions in sequence. At such points either of two different N-mers, differing in the last nucleotide, may be deduced for extending the sequence construction. Such branching points of sequence deduction limit unambiguous assembly of sequence.
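The reassembly by (N-1)-mer overlap, and its interruption at a branching point, may be sketched as follows. This is a simplified illustration that assumes an error-free set of hybridizing N-mers and a known starting N-mer; the function name `assemble` and the second example target are hypothetical.

```python
def assemble(nmers: set[str], start: str) -> tuple[str, bool]:
    """Extend a sequence from `start` through (N-1)-mer overlaps.

    Returns the sequence deduced so far and a flag that is True when
    extension stopped at a branching point (more than one N-mer could
    extend the sequence) rather than at the end of the target.
    """
    sequence = start
    current = start
    used = {start}
    while True:
        overlap = current[1:]  # the (N-1)-mer shared with the next N-mer
        successors = [m for m in nmers if m.startswith(overlap) and m not in used]
        if len(successors) != 1:
            return sequence, len(successors) > 1
        current = successors[0]
        used.add(current)
        sequence += current[-1]


def nmer_set(target: str, n: int) -> set[str]:
    """Helper for the example: the set of N-mers a target would yield."""
    return {target[i:i + n] for i in range(len(target) - n + 1)}


# SEQ ID NO. 1 contains no repeated 4-mer, so 5-mer assembly runs to the end.
print(assemble(nmer_set("ATAAAGCTGCTTC", 5), "ATAAA"))
# Hypothetical target in which the 3-mer ACG occurs twice: assembly branches.
print(assemble(nmer_set("TACGTTTACGA", 4), "TACG"))
```

In the second example two different 4-mers, differing only in the last nucleotide, can extend the sequence after the repeated 3-mer, so the deduction cannot proceed without additional information.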

[0059] The probabilistic distribution frequency of such duplicated sequences, that interfere with sequence deduction, for a certain length of DNA can be calculated. As sequence motifs and patterns are not completely random in their distribution among species and between types of sequence, it will be readily appreciated that often the best approach for calculating this probabilistic distribution frequency will be a species-specific genomic heuristic bioinformatics approach. The derivation of a probabilistic distribution frequency function requires a parameter pertaining to sequence organization termed in the art the sequence subfragment (SF).

[0060] As defined in the art, a sequence subfragment exists if any part of the sequence of a target nucleic acid starts and ends with an (N-1)-mer that is repeated two or more times within the target or analyte sequence. Thus, subfragments are sequences generated between two points of branching in the process of assembly of the sequences in the method of the invention. As defined to include the short double or greater repeat, the sum of lengths of all subfragments is longer than the actual target nucleic acid because of overlapping short ends. Generally, subfragments cannot be assembled in a linear order without additional information since they can possibly have the same repeated (N-1)-mers at their ends and starts. Different numbers of subfragments are obtained for each nucleic acid target depending on the number of doubly repeated (N-1)-mers. Their number depends on the value of N-1, the length of the target and the type and species of derivation of the nucleic acid sequence. Sequence “type” is intended to denote intron, exon and regulatory sequence of genomic nucleic acid, and distinctions between conventional genomic and mRNA transcript sequence, and viral genomic transcript and reverse transcriptase transcript.
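Whether a given target gives rise to subfragments for a chosen probe length may be checked by counting repeated (N-1)-mers. The sketch below is illustrative only; the function name `repeated_overlaps` and the second example target are hypothetical, and SEQ ID NO. 1 is used as an example of a target without such repeats for N = 5.

```python
from collections import Counter


def repeated_overlaps(target: str, n: int) -> dict[str, int]:
    """Return each (N-1)-mer occurring two or more times in the target,
    mapped to its number of occurrences."""
    k = n - 1
    if k < 1 or k > len(target):
        raise ValueError("n - 1 must be between 1 and the target length")
    counts = Counter(target[i:i + k] for i in range(len(target) - k + 1))
    return {kmer: count for kmer, count in counts.items() if count >= 2}


# No 4-mer repeats in SEQ ID NO. 1, so 5-mer probes yield no subfragments.
print(repeated_overlaps("ATAAAGCTGCTTC", 5))
# Hypothetical target with repeated 3-mers, which would branch for 4-mer probes.
print(repeated_overlaps("TACGTTTACGA", 4))
```

An empty result indicates that the target can be assembled in linear order from its N-mers; each repeated (N-1)-mer found marks a potential branching point.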

[0061] Thus for the analyte sequence (the ribonucleotide U and the deoxyribonucleotide T are used interchangeably for base pairing purposes) 5′-ATAAAGCTGCTTC (SEQ ID. NO. 1) (having no subfragments) will hybridize only to beads or array sites having the 5-mers

SEQUENCE LISTING

Number of sequences: 50

SEQ ID NO | Length | Type | Organism | Description | Sequence
1 | 13 | DNA | Artificial Sequence | Analyte sequence | ataaagctgc ttc
2 | 23 | DNA | Artificial Sequence | Labeling structure | aaaaaaaaac ccccttttct ttt
3 | 23 | DNA | Artificial Sequence | Labeling structure | aaaaaaaaac cccctttttt ttt
4 | 23 | DNA | Artificial Sequence | Labeling structure | aaaagaaaac ccccttttct ttt
5 | 14 | DNA | Artificial Sequence | Adaptor sequence | acgagctgcc agtc
6 | 14 | DNA | Artificial Sequence | Adaptor sequence | gactggcagc tcga
7 | 28 | DNA | Artificial Sequence | Adaptor sequence | nnnnacgagc tgccagtcca tttaggcg
8 | 28 | DNA | Artificial Sequence | Adaptor sequence | nnngacgagc tgccagtccg ctttgtag
9 | 28 | DNA | Artificial Sequence | Adaptor sequence | nnnnacgagc tgccagtcgg aacctgaa
10 | 28 | DNA | Artificial Sequence | Adaptor sequence | nngnacgagc tgccagtcat tcctcctc
11 | 28 | DNA | Artificial Sequence | Adaptor sequence | nnnnacgagc tgccagtccg aagaagtc
12 | 28 | DNA | Artificial Sequence | Adaptor sequence | ngnnacgagc tgccagtcgg cgataact
13 | 28 | DNA | Artificial Sequence | Adaptor sequence | nnnnacgagc tgccagtcgc atccatct
14 | 28 | DNA | Artificial Sequence | Adaptor sequence | gnnnacgagc tgccagtcgc cagtgtta
15 | 28 | DNA | Artificial Sequence | Adaptor sequence | nnnyacgagc tgccagtcca tttaggcg
16 | 28 | DNA | Artificial Sequence | Adaptor sequence | nnnkacgagc tgccagtccg ctttgtag
17 | 28 | DNA | Artificial Sequence | Adaptor sequence | nnynacgagc tgccagtcgg aacctgaa
18 | 28 | DNA | Artificial Sequence | Adaptor sequence | nnknacgagc tgccagtcat tcctcctc
19 | 28 | DNA | Artificial Sequence | Adaptor sequence | nynnacgagc tgccagtccg aagaagtc
20 | 28 | DNA | Artificial Sequence | Adaptor sequence | nknnacgagc tgccagtcgg cgataact
21 | 28 | DNA | Artificial Sequence | Adaptor sequence | ynnnacgagc tgccagtcgc atccatct
22 | 28 | DNA | Artificial Sequence | Adaptor sequence | knnnacgagc tgccagtcgc cagtgtta
23 | 10 | DNA | Artificial Sequence | Decoder binding sequence | catttaggcg
24 | 10 | DNA | Artificial Sequence | Decoder binding sequence | ggaacctgaa
25 | 10 | DNA | Artificial Sequence | Decoder binding sequence | cgaagaagtc
26 | 10 | DNA | Artificial Sequence | Decoder binding sequence | gcatccatct
27 | 10 | DNA | Artificial Sequence | Decoder probe sequence | cgcctaaatg
28 | 10 | DNA | Artificial Sequence | Decoder probe sequence | ttcaggttcc
29 | 10 | DNA | Artificial Sequence | Decoder probe sequence | gacttcttcg
30 | 10 | DNA | Artificial Sequence | Decoder probe sequence | agatggatgc
31 | 10 | DNA | Artificial Sequence | Decoder binding sequence | cgctttgtag
32 | 10 | DNA | Artificial Sequence | Decoder binding sequence | attcctcctc
33 | 10 | DNA | Artificial Sequence | Decoder binding sequence | ggcgataact
34 | 10 | DNA | Artificial Sequence | Decoder binding sequence | gccagtgtta
35 | 10 | DNA | Artificial Sequence | Decoder probe sequence | ctacaaagcg
36 | 10 | DNA | Artificial Sequence | Decoder probe sequence | gaggaggaat
37 | 10 | DNA | Artificial Sequence | Decoder probe sequence | agttatcgcc
38 | 10 | DNA | Artificial Sequence | Decoder probe sequence | taacactggc
39 | 21 | DNA | Homo sapiens | | tttgtgaatg aggccgcata t
40 | 21 | DNA | Homo sapiens | | atatgcggcc tcattcacaa a
41 | 21 | DNA | Artificial Sequence | Primer | atatgcggcc bcattcacaa a
42 | 21 | DNA | Artificial Sequence | Primer | atatgcggcc ncattcacaa a
43 | 21 | DNA | Artificial Sequence | Primer | atatgcggcc gcattcacaa a
44 | 21 | DNA | Artificial Sequence | Primer | atatgcggcc ycattcacaa a
45 | 21 | DNA | Artificial Sequence | Primer | atatgcggcc kcattcacaa a
46 | 21 | DNA | Artificial Sequence | Primer | atatgcggcc ccattcacaa a
47 | 21 | DNA | Artificial Sequence | Primer | atatgcggcc gcattcacaa a
48 | 21 | DNA | Artificial Sequence | Primer | atatgcggcc rcattcacaa a
49 | 21 | DNA | Artificial Sequence | Primer | atatgcggcc scattcacaa a
50 | 21 | DNA | Artificial Sequence | Probe for the ORF 854 mutation | atatgcggcc acattcacaa a

We claim:
 1. A method of employing oligonucleotide probes to obtain information on a target nucleic acid analyte containing a target sequence segment, the method comprising: contacting the analyte, under hybridizing conditions, with at least two oligonucleotide probes, each oligonucleotide probe comprising a sequence segment complementary, or complementary except at a position corresponding to a probed position of the target sequence, wherein a nucleotide at the position of each oligonucleotide probe corresponding to the probed position is capable of base pairing with a set of two or more nucleotides, each set is unique but includes one nucleotide common to all the sets, and one nucleotide that may be present in the target sequence segment is not represented in any set, further wherein hybridization of each oligonucleotide probe to the target sequence segment under the hybridizing conditions occurs only if no mismatch exists at the probed position, such that depending upon the identity of the nucleotide at the probed position of the target sequence segment, all, some or none of the oligonucleotide probes hybridize to the target nucleic acid sequence.
 2. The method of claim 1 wherein four nucleotides may be present in the target sequence segment and each oligonucleotide probe comprises, at the position corresponding to the probed position a nucleotide base pairing with two nucleotides.
 3. The method of claim 1 wherein more than four nucleotides may be present in the target sequence segment, each oligonucleotide probe comprises, at the position corresponding to the probed position a nucleotide base pairing with more than two nucleotides, and more than two oligonucleotide probes are employed.
 4. The method of claim 3 wherein five nucleotides may be present in the target sequence segment, three oligonucleotide probes are employed, each oligonucleotide probe comprises, at the position corresponding to the variable position a nucleotide base pairing with two or three nucleotides, and each set has at least one nucleotide not common to all the sets in common with another set.
 5. The method of claim 1 wherein a null hybridizing sequence comprising a nucleic acid sequence, complementary to the nucleic acid sequence of interest, having the nucleotide represented in neither the first nor the second set of two or more nucleotides at the variable position is employed.
 6. The method of claim 1 employed as a sequencing method.
 7. The method of claim 6 wherein the sequencing method is by analysis of hybridization data obtained from an array of oligonucleotide probes.
 8. The method of claim 7 wherein the array comprises arrayed individual beads or particles, each bead or particle having a surface to which is attached a plurality of oligonucleotide probes of identical sequence.
 9. The method of claim 7 wherein the array comprises a substrate having a surface, the surface having a plurality of discrete surface sites, each site having attached a plurality of oligonucleotide probes of identical sequence.
 10. The method of claim 7 wherein detection of a target sequence segment hybridizing to an oligonucleotide probe is by detection of a discrete label moiety linked to the target sequence segment.
 11. The method of claim 10 wherein the discrete label moiety linked to the target sequence segment comprises a nucleic acid sequence.
 12. The method of claim 10 wherein the discrete label moiety linked to the target sequence segment comprises a luminescent moiety.
 13. The method of claim 12 wherein the luminescent moiety is a chemiluminescent or fluorescent moiety.
 14. The method of claim 9 wherein detection of a target sequence segment hybridizing to an oligonucleotide probe is by detection of a label intrinsic to the target sequence segment.
 15. The method of claim 14 wherein the label intrinsic to the target sequence segment is ³²P.
 16. The method of claim 7 wherein detection of a target sequence segment hybridizing to an oligonucleotide probe is by detection of the heat of hybridization.
 17. The method of claim 6 wherein the sequencing method is by detection of labels that attach by hybridization to the target sequence segment.
 18. The method of claim 1 wherein hybridized target nucleic acids are amplified by a polymerase enzyme that requires a hybridized complex for polymerizing the formation of nucleic acid sequence.
 19. The method of claim 18 wherein hybridized nucleic acids are amplified by polymerase chain reaction.
 20. The method of claim 19 wherein hybridized nucleic acids are amplified by an RNA replicase enzyme.
 21. The method of claim 19 for genetic analysis.
 22. The method of claim 21 employed for allelic analysis.
 23. The method of claim 21 wherein genomic DNA is analyzed.
 24. The method of claim 21 wherein genomic cDNA is analyzed.
 25. The method of claim 2 wherein the four nucleotides are A, T, C and G.
 26. The method of claim 4 wherein the five nucleotides are A, T, C, G and I (Inosine).
 27. The method of claim 6 wherein the sequencing method is by analysis of hybridization data obtained from an array of target nucleic acid analyte sequences attached to a substrate surface.
 28. The method of claim 14 wherein the array comprises arrayed individual beads or particles, each bead or particle having a surface, the surface having attached a plurality of target nucleic acid analyte sequences having an identical sequence.
 29. The method of claim 14 wherein the array comprises arrayed discrete sites on a substrate surface of an integrated substrate, each site having a surface, the surface having attached a plurality of target nucleic acid analyte sequences having an identical sequence.
 30. The method of claim 14 wherein each oligonucleotide probe sequence additionally comprises a linker moiety and a label moiety.
 31. The method of claim 15 wherein the linker moiety comprises a common nucleic acid sequence and the label moiety comprises a signature nucleic acid sequence that identifies the target sequence segment.
 32. The method of claim 16 wherein the common nucleic acid sequence is double stranded.
 33. The method of claim 17 wherein decoder labels comprising a nucleic acid sequence complementary to the signature sequence and a second label moiety are employed to image the array.
 34. The method of claim 18 wherein the second label moiety comprises a luminescent moiety.
 35. The method of claim 34 wherein the luminescent moiety is a fluorescent or chemiluminescent moiety.
 36. The method of claim 14 wherein the substrate surface is functionalized with a surface modification to enhance hybridization.
 37. The method of claim 36 wherein the enhancement is increasing stringency or kinetics of hybridization.
 38. The method of claim 14 wherein the electric potential at the substrate surface is electronically controlled to enhance hybridization.
 39. The method of claim 29 wherein the integrated substrate comprises a semiconductor chip comprising electronic circuitry, wherein the electric potential at the individual array sites of the substrate surface is independently electronically controlled to enhance hybridization.
 40. A method for analysis of a plurality of target nucleic acid sequences, using label nucleotide sequences, by base pairing complementarity under hybridizing conditions that reduce the number of label nucleotide sequences required to uniquely label each of the target nucleotide sequences, the method comprising: a) contacting each target nucleic acid sequence with a collection of label nucleotide sequences each comprising a label moiety and an anti-target sequence segment complementary, or complementary except at one variable position, to a target sequence segment of the target nucleic acid sequence, wherein the position of each anti-target sequence segment corresponding to the variable position comprises a nucleotide base pairing with a set of two or more nucleotides present in the plurality of nucleic acid sequences, each set is different from each other set, each set includes at least one nucleotide in common with each other set, and a nucleotide present in the plurality of nucleic acid sequences is not represented in any set, and hybridization of the anti-target sequence segment of each label nucleotide sequence to a target sequence segment under the conditions occurs only if complementarity exists between the nucleotide at the variable position and the nucleotide at the corresponding position of each label nucleotide; and b) detecting which of the label nucleotide sequences hybridize to each target nucleic acid sequence, wherein depending upon the identity of the nucleotide at the variable position of the target sequence segment, some, all or none of the label nucleotide sequences employed hybridize to each target nucleotide sequence.
 41. The method of claim 40 wherein four nucleotides are present in the sequence of interest, two label nucleotide sequences are employed for each target sequence segment, and each anti-target sequence segment comprises, at the position corresponding to the variable position of the target sequence segment, a nucleotide base pairing with two nucleotides.
 42. The method of claim 40 wherein more than four nucleotides are present in the sequence of interest, each anti-target sequence segment comprises, at the position corresponding to the variable position of the anti-target sequence, a nucleotide base pairing with more than two nucleotides, and the collection of label nucleotide sequences includes more than two label nucleotide sequences.
 43. The method of claim 42 wherein five nucleotides may be present in the target sequence segment, three label nucleotide sequences are employed, each label nucleotide sequence comprises, at the position corresponding to the variable position of the anti-target sequence, a nucleotide base pairing with two or three nucleotides and each set has at least one nucleotide not common to all the sets in common with another set.
 44. The method of claim 40 wherein a null label nucleotide sequence is employed, the null label nucleotide sequence comprising a nucleic acid sequence complementary to a target segment of a target nucleic acid sequence, having the nucleotide not represented in any set of two or more nucleotides at the variable position.
 45. The method of claim 40 employed as a sequencing method.
 46. The method of claim 45 wherein the sequencing method is by analysis of hybridization data obtained from an array of label nucleotide sequences.
 47. The method of claim 46 wherein the array comprises arrayed individual beads or particles, each bead or particle having a surface to which is attached a plurality of label nucleotide sequences having an identical sequence.
 48. The method of claim 46 wherein the array comprises a substrate having a surface, the surface having a plurality of discrete surface sites, each site having attached a plurality of label nucleotide sequences having an identical sequence.
 49. The method of claim 46 wherein detection of a target nucleic acid sequence hybridizing to a label nucleotide sequence is by detection of a discrete analyte label moiety linked to the target nucleic acid sequence.
 50. The method of claim 49 wherein identification of the label nucleotide sequence to which a target nucleic acid sequence hybridizes is by identification of the label moiety of the label nucleotide sequence.
 51. The method of claim 49 wherein the analyte label moiety linked to the target nucleic acid sequence comprises a nucleic acid sequence.
 52. The method of claim 49 wherein the discrete analyte label moiety linked to the target nucleic acid sequence comprises a luminescent moiety.
 53. The method of claim 52 wherein the luminescent moiety is a chemiluminescent or fluorescent moiety.
 54. The method of claim 53 wherein the discrete analyte label moiety of the target nucleic acid sequence and the label moiety of the label nucleotide sequence are both fluorescent and capable of fluorescence resonance energy transfer, and detection of hybridization of label nucleotide sequence and target nucleic acid sequence is by detecting fluorescence resonance energy transfer.
 55. The method of claim 47 wherein detection of a target nucleic acid sequence hybridizing to a label nucleotide sequence is by detection of an analyte label intrinsic to the target nucleic acid sequence of interest.
 56. The method of claim 40 wherein a discrete analyte label moiety is linked to each target nucleic acid sequence, identification of the label nucleotide sequence hybridizing to a target nucleic acid sequence is by detection of the label moiety, and identification of the target nucleic acid sequence hybridizing to a label nucleotide sequence is by detection of the discrete analyte label moiety linked to the target nucleic acid sequence.
 57. The method of claim 55 wherein the label intrinsic to the target nucleic acid sequence is ³²P.
 58. The method of claim 46 wherein detection of a target nucleic acid sequence hybridizing to a label nucleotide sequence is by detection of the heat of hybridization.
 59. The method of claim 58 wherein a discrete analyte label moiety is linked to each target nucleic acid sequence, and identification of the target nucleic acid sequence hybridizing to a label nucleotide sequence is by detection of the discrete analyte label moiety linked to the target nucleic acid sequence.
 60. The method of claim 45 wherein the sequencing method is by detection of label nucleotide sequences that attach by hybridization to the target nucleic acid sequence.
 61. The method of claim 60 wherein a discrete analyte label moiety is linked to each target nucleic acid sequence, identification of the label nucleotide sequence hybridizing to a target nucleic acid sequence is by detection of the label moiety, and identification of the target nucleic acid sequence hybridizing to a label nucleotide sequence is by detection of the discrete analyte label moiety linked to the target nucleic acid sequence.
 62. The method of claim 61 wherein successive target segments in each target analyte sequence are exposed for hybridization to label nucleotide sequences by endonuclease digestion.
 63. The method of claim 40 wherein hybridized target nucleic acid sequences are amplified by a polymerase enzyme that requires a hybridized complex for polymerizing the formation of nucleic acid sequence.
 64. The method of claim 63 wherein hybridized target nucleic acid sequences are amplified by polymerase chain reaction.
 65. The method of claim 63 wherein hybridized target nucleic acid sequences are amplified by an RNA replicase enzyme.
 66. The method of claim 63 employed for genetic analysis.
 67. The method of claim 66 employed for allelic analysis.
 68. The method of claim 66 wherein genomic DNA is analyzed.
 69. The method of claim 66 wherein genomic cDNA is analyzed.
 70. The method of claim 41 wherein the four nucleotides are A, T, C and G.
 71. The method of claim 43 wherein the five nucleotides are A, T, C, G and I (Inosine).
 72. The method of claim 45 wherein the sequencing method is by analysis of hybridization data obtained from an array of target nucleic acid sequences attached to a substrate surface.
 73. The method of claim 72 wherein a discrete analyte label moiety is linked to each target nucleic acid sequence, identification of the label nucleotide sequence hybridizing to a target nucleic acid sequence is by detection of the label moiety, and identification of the target nucleic acid sequence attached to the substrate surface is by detection of the discrete analyte label moiety linked to the target nucleic acid sequence.
 74. The method of claim 72 wherein the array comprises arrayed individual beads or particles, each bead or particle having a surface, the surface having attached a plurality of target nucleic acid sequences having an identical sequence.
 75. The method of claim 72 wherein the array comprises arrayed discrete sites on a substrate surface of an integrated substrate, each site having a surface, the surface having attached a plurality of target nucleic acid sequences having an identical sequence.
 76. The method of claim 72 wherein each label nucleotide sequence additionally comprises a linker moiety.
 77. The method of claim 76 wherein the linker moiety comprises a common nucleic acid sequence and the label moiety comprises a signature nucleic acid sequence that identifies the target sequence segment.
 78. The method of claim 77 wherein the common nucleic acid sequence is double stranded.
 79. The method of claim 78 wherein decoder labels comprising a nucleic acid sequence complementary to the signature sequence and a second label moiety are employed to image the array.
 80. The method of claim 79 wherein the second label moiety comprises a luminescent moiety.
 81. The method of claim 80 wherein the luminescent moiety is a fluorescent or chemiluminescent moiety.
 82. The method of claim 81 wherein the luminescent moiety is phycoerythrin, the double stranded common nucleic acid sequence is 14 nucleotides long, the target sequence segment is 4 nucleotides long and the signature sequence is 10 nucleotides long, and successive cycles of hybridization, ligation, detection and endonuclease digestion are employed.
 83. The method of claim 72 wherein the substrate surface is functionalized with a surface modification to enhance hybridization.
 84. The method of claim 83 wherein the enhancement is increasing stringency or kinetics of hybridization.
 85. The method of claim 72 wherein the electric potential at the substrate surface is electronically controlled to enhance hybridization.
 86. The method of claim 75 wherein the integrated substrate comprises a semiconductor chip comprising electronic circuitry, wherein the electric potential at the individual array sites of the substrate surface is independently electronically controlled to enhance hybridization.
 87. A method for determining a nucleotide at a position of interest in an analyte nucleic acid sequence having a target segment under conditions suitable for hybridization, the method comprising: a) contacting the nucleic acid sequence with a first probe and a second probe, each probe comprising a nucleic acid sequence complementary, or complementary except at the position of interest, to the target segment, wherein the position of the first probe corresponding to the position of interest comprises a nucleotide base pairing with a first set of two or more nucleotides present in the nucleic acid sequence and the position of the second probe corresponding to the position of interest comprises a nucleotide base pairing with a second set of two or more nucleotides present in the nucleic acid sequence, and the first set of two or more nucleotides includes at least one nucleotide that is a member of the second set of two or more nucleotides, the two sets are not identical, and a nucleotide present in the nucleic acid sequence of interest is represented in neither set, and further wherein hybridization of the first probe and the second probe to the target segment under the conditions occurs only if complementarity exists between the nucleotide at the position of interest and the nucleotide at the corresponding position of the first probe and the second probe; b) detecting which of the first probe and the second probe hybridize to the target segment; and c) analyzing the data from step b), wherein cumulative information as to the identity of the nucleotide at the position of interest is obtained from the combined data from the first probe and second probe.
 88. A method for determining a nucleotide at a position of interest in a nucleic acid sequence having a target segment under conditions suitable for hybridization, the method comprising: a) contacting the nucleic acid sequence with a first probe comprising a nucleic acid sequence complementary, or complementary except at the position of interest, to the nucleic acid sequence, wherein the position of the first probe corresponding to the position of interest comprises a nucleotide base pairing with a first set of two or more nucleotides present in the nucleic acid sequence, and a nucleotide present in the nucleic acid sequence is not represented in the first set of two or more nucleotides, and further wherein hybridization of the first probe to the nucleic acid sequence under the conditions occurs only if complementarity exists between the nucleotide at the position of interest and the nucleotide at the corresponding position of the first probe; b) contacting the nucleic acid sequence with a second probe comprising a nucleic acid sequence complementary, or complementary except at the position of interest, to the nucleic acid sequence, provided that the position of the second probe corresponding to the position of interest comprises a nucleotide base pairing with a second set of two or more nucleotides present in the nucleic acid sequence, a nucleotide present in the nucleic acid sequence is not represented in the second set, the second set is different from the first set, and the first set includes at least one nucleotide that is a member of the second set, and the nucleotide that is not represented in the first set is not represented in the second set, wherein hybridization of the second probe to the nucleic acid sequence under the conditions occurs only if complementarity exists between the nucleotide at the position of interest and the nucleotide at the corresponding position of the second probe; c) detecting whether the first probe and the second probe hybridize to the nucleic acid sequence; and d) analyzing the hybridization data, wherein cumulative information as to the identity of the nucleotide at the position of interest is obtained from the combined data from the first and second probes.
 89. A collection comprised of probe nucleic acid sequence sets, each of the collection of nucleic acid sequence sets having a position corresponding to a probed position of a target nucleic acid sequence, wherein each probed position of each nucleic acid sequence set is capable of base pairing to a unique degenerate set of nucleotides, each unique degenerate set of nucleotides has at least one nucleotide in common with each other unique degenerate set of nucleotides, and one nucleotide is commonly excluded from all the unique degenerate sets of nucleotides.
 90. The collection of claim 89 wherein each sequence set consists of a single sequence.
 91. The collection of claim 89 wherein each probe nucleic acid sequence comprises, at the position corresponding to the position of interest, a nucleotide base pairing with two nucleotides, and the collection consists of two probe nucleic acid sequence sets.
 92. The collection of claim 89 wherein each probe nucleic acid sequence comprises, at the position corresponding to the position of interest, a nucleotide base pairing with more than two nucleotides, and the collection consists of more than two probe nucleic acid sequences or probe nucleic acid sequence sets.
 93. The collection of claim 92 wherein each probe nucleic acid sequence comprises, at the position corresponding to the position of interest, a nucleotide base pairing with three nucleotides, and the collection consists of three probe sequence sets.
 94. The collection of claim 89 in combination with a null probe comprising a nucleic acid sequence complementary to the target nucleic acid sequence, having the nucleotide that is commonly excluded from all the unique degenerate sets of nucleotides at the position of interest.
 95. An array comprising the probe nucleic acid sequences of claim 89 arrayed attached to a substrate surface.
 96. The array of claim 95 comprising arrayed individual beads or particles, each bead or particle having a surface to which is attached a plurality of probes having an identical sequence.
 97. The array of claim 95 comprising an integrated substrate having a surface, the surface having a plurality of discrete surface sites, each site having attached a plurality of probe nucleic acid sequences having an identical sequence.
 98. The array of claim 95 wherein each probe nucleic acid sequence additionally comprises a label moiety.
 99. The collection of claim 89 wherein each probe nucleic acid sequence additionally comprises a label moiety.
 100. The collection of claim 89 wherein each probe nucleic acid sequence additionally comprises a linker moiety and a label moiety.
 101. The collection of claim 100 wherein the linker moiety comprises a common nucleic acid sequence and the label moiety comprises a signature nucleic acid sequence that identifies the target sequence segment.
 102. The collection of claim 101 wherein the common nucleic acid sequence is double stranded.
 103. The collection of claim 102 additionally comprising decoders, each decoder comprising a nucleic acid sequence complementary to the signature sequence and a second label moiety.
 104. The collection of claim 103 wherein the second label moiety comprises a luminescent moiety.
 105. The collection of claim 102 wherein the double stranded common nucleic acid sequence is 14 nucleotides long, the target segment is 4 nucleotides long and the signature sequence is 10 nucleotides long, and the second label moiety is phycoerythrin.
 106. The array of claim 95 wherein the substrate surface is functionalized with a surface modification to enhance hybridization.
 107. The array of claim 106 wherein the enhancement is increasing stringency or kinetics of hybridization.
 108. The array of claim 95 wherein the electric potential at the substrate surface is electronically controlled to enhance hybridization.
 109. The array of claim 97 wherein the integrated substrate comprises a semiconductor chip comprising electronic circuitry, wherein the electric potential at the individual array sites of the substrate surface is independently electronically controlled to enhance hybridization.
 110. A probe system comprising a pair of probe nucleic acid sequence sets, each of the pair of probe nucleic acid sequence sets having a position corresponding to a probed position of a target nucleic acid sequence, wherein each probed position of each of the pair of probe nucleic acid sequence sets is capable of base pairing to a unique doubly degenerate set of nucleotides, each doubly degenerate set of nucleotides sharing a single common nucleotide.
 111. The system of claim 110 wherein each sequence set consists of a single sequence.
 112. The system of claim 110 wherein each probe nucleic acid sequence comprises, at the position corresponding to the position of interest, a nucleotide base pairing with two nucleotides, and the collection consists of two probe nucleic acid sequence sets.
 113. The system of claim 110 wherein each probe nucleic acid sequence comprises, at the position corresponding to the position of interest, a nucleotide base pairing with more than two nucleotides, and the collection consists of more than two probe nucleic acid sequences or probe nucleic acid sequence sets.
 114. The probe nucleic acid sequences of claim 113 wherein each probe nucleic acid sequence comprises, at the position corresponding to the position of interest, a nucleotide base pairing with three nucleotides, and the collection consists of three probe sequence sets.
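
The set conditions recited in the claims above (each degenerate position pairs with two or more nucleotides, the sets differ, every two sets share at least one nucleotide, and one nucleotide is excluded from every set) may be illustrated by the following minimal Python sketch. It is an illustration only, not part of the claimed subject matter. The function names and the particular pairing sets chosen ({A, G} and {G, C} for two probes; three three-membered sets for the five-nucleotide case) are hypothetical assumptions; the claims do not prescribe which nucleotides belong to which set. The sketch shows that, under these assumptions, the pattern of which probes hybridize identifies the nucleotide at the position of interest.

```python
from itertools import combinations


def validate_pairing_sets(pairing_sets, alphabet):
    """Check the set conditions recited in the claims; raise ValueError on failure."""
    sets = [frozenset(s) for s in pairing_sets]
    if any(len(s) < 2 for s in sets):
        raise ValueError("each degenerate position must pair with two or more nucleotides")
    if len(set(sets)) != len(sets):
        raise ValueError("each pairing set must differ from every other set")
    for a, b in combinations(sets, 2):
        if not a & b:
            raise ValueError("every two sets must share at least one nucleotide")
    covered = frozenset().union(*sets)
    if not set(alphabet) - covered:
        raise ValueError("one nucleotide must be excluded from every set")


def build_decoder(pairing_sets, alphabet):
    """Map each hybridization pattern (one boolean per probe) to a target nucleotide."""
    validate_pairing_sets(pairing_sets, alphabet)
    decoder = {}
    for nucleotide in alphabet:
        # A probe is assumed to hybridize only if the target nucleotide is in its set.
        pattern = tuple(nucleotide in s for s in pairing_sets)
        if pattern in decoder:
            raise ValueError(f"{decoder[pattern]} and {nucleotide} give the same pattern")
        decoder[pattern] = nucleotide
    return decoder


# Hypothetical example sets (assumptions, not taken from the claims).
TWO_PROBES = [{"A", "G"}, {"G", "C"}]                                # four nucleotides
THREE_PROBES = [{"A", "C", "G"}, {"A", "T", "G"}, {"T", "C", "G"}]  # five nucleotides

two_probe_decoder = build_decoder(TWO_PROBES, "ATCG")
three_probe_decoder = build_decoder(THREE_PROBES, "ATCGI")

# Both probes hybridize: the shared nucleotide. Neither hybridizes: the excluded one.
print(two_probe_decoder[(True, True)], two_probe_decoder[(False, False)])
```

In this sketch the excluded nucleotide is identified by the absence of hybridization to every probe; a null probe of the kind recited in the claims could be used to confirm that case positively rather than by absence of signal.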